This title is a bit provocative, yet strictly speaking it is correct. We will show two examples where we realise what is missing from GAMMs in order to become tools for time series analysis. In short, we need to communicate to the model something about the order of the time samples, which we have not done so far.
Consider the data below:
Let’s estimate a GAM:
m1 <- bam(y ~ s(time, k = 20), data = curves)
summary(m1)
##
## Family: gaussian
## Link function: identity
##
## Formula:
## y ~ s(time, k = 20)
##
## Parametric coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.1795848 0.0004607 389.8 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Approximate significance of smooth terms:
## edf Ref.df F p-value
## s(time) 15.52 17.6 6060 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## R-sq.(adj) = 0.915 Deviance explained = 91.5%
## fREML = -16489 Scale est. = 0.0021116 n = 9950
plot_smooth(m1, view = "time", col = "slateblue4", print.summary = FALSE, rug = FALSE)
Now, imagine that the same data set was not a collection of time
series. For example instead of time we could have
price of houses (in millions of euros), and y
could be some measure of demand. The plot would be like this:
Question: How would you change the GAM above to reflect the fact that there are no curves but it is all one data set?
gam.check(m1)
##
## Method: fREML Optimizer: perf newton
## full convergence after 9 iterations.
## Gradient range [-6.31743e-06,6.479967e-06]
## (score -16488.76 & scale 0.002111591).
## Hessian positive definite, eigenvalue range [8.040793,4974.011].
## Model rank = 20 / 20
##
## Basis dimension (k) checking results. Low p-value (k-index<1) may
## indicate that k is too low, especially if edf is close to k'.
##
## k' edf k-index p-value
## s(time) 19.0 15.5 0.99 0.23
We note a strong evidence for heteroscedasticity. Where does it come from?
Look at the variation in the large peak. When y is high
we are around the peak, and that is also where the largest variation
across curves occurs, hence the wider residual span for higher fitted
values.
Now let’s consider one particular curve, say one whose first peak is
higher than the mean. As we see from the raw curves, we expect that if
the residual error for this particular curve at say
time = 0.4 is positive, it will also be positive in the
next time samples. Hence neighbouring residuals are
correlated, which violates one of the hypotheses of the model.
We would like to express this fact in the GAMM.
A way to alleviate the problem above is to induce a structure in the residuals as follows: \[ \epsilon_i = \rho \cdot \epsilon_{i-1} + \psi_i \] which means that the residual at point \(i\) is proportional to the residual at point \(i-1\), i.e. one step earlier on the time axis, plus another unknown term \(\psi_i\), the latter values distributed as \(N(0, \sigma^2)\) and independent. The GAMM will not estimate the parameter \(\rho\) for us directly, rather we meed to set it ourselves. This type of sub-model for the noise is well known in signal processing and it’s called AR1, i.e. auto regressive model of order 1 (because it only depends on one step in the past).
What we typically do is this:
The following estimates \(\rho\) and plots the complete autocorrelation function for the residuals:
m1.acf <- acf_resid(m1)
This function estimates the correlation of residuals with themselves in the past at different lags. We take the value as lag 1 as our estimate for \(\rho\)
rho <- m1.acf[2]
# at index 1 we have lag 0, whose value is 1 by definition
# at index 2 we have lag 1, which is what we need
rho
## 1
## 0.8045908
Now we re-run the GAMM. We set \(\rho\) by specifying the argument
rho. We also need to indicate where the start of curves in
the data are, otherwise the AR1 model will be also applied between the
last and the first sample of curves (argument
AR.start).
m1.ar1 <- bam(y ~ s(time, k = 20), data = curves,
rho = rho, AR.start = curves$time == 0
)
summary(m1.ar1)
##
## Family: gaussian
## Link function: identity
##
## Formula:
## y ~ s(time, k = 20)
##
## Parametric coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.179576 0.001376 130.5 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Approximate significance of smooth terms:
## edf Ref.df F p-value
## s(time) 12.43 15.13 804.6 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## R-sq.(adj) = 0.915 Deviance explained = 91.5%
## fREML = -21706 Scale est. = 0.0020926 n = 9950
plot_smooth(m1.ar1, view = "time", col = "slateblue4", rug = FALSE, print.summary = FALSE)
acf_resid(m1.ar1)
gam.check(m1.ar1)
##
## Method: fREML Optimizer: perf newton
## full convergence after 7 iterations.
## Gradient range [-1.51787e-05,1.483733e-05]
## (score -21705.86 & scale 0.002092649).
## Hessian positive definite, eigenvalue range [6.899551,4974.007].
## Model rank = 20 / 20
##
## Basis dimension (k) checking results. Low p-value (k-index<1) may
## indicate that k is too low, especially if edf is close to k'.
##
## k' edf k-index p-value
## s(time) 19.0 12.4 1.04 1
The results show that:
The way to explicitly represent the information about curves, i.e. which sample belongs to which curve, is to introduce a random smooth factor at the curve level:
m1.randCurves <- bam(y ~ s(time, k = 20) +
+ s(time, curveId, bs = "fs", m=1, k = 15)
, nthreads = 2
, data = curves)
summary(m1.randCurves)
##
## Family: gaussian
## Link function: identity
##
## Formula:
## y ~ s(time, k = 20) + +s(time, curveId, bs = "fs", m = 1, k = 15)
##
## Parametric coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.179574 0.004535 39.6 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Approximate significance of smooth terms:
## edf Ref.df F p-value
## s(time) 17.69 18.6 671.54 <2e-16 ***
## s(time,curveId) 505.63 749.0 56.77 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## R-sq.(adj) = 0.984 Deviance explained = 98.5%
## fREML = -24077 Scale est. = 0.00039985 n = 9950
plot_smooth(m1.randCurves, view = "time", col = "slateblue4", rug = FALSE, print.summary = FALSE)
op <- par(mfrow=c(2,2))
gam.check(m1.randCurves)
##
## Method: fREML Optimizer: perf newton
## full convergence after 7 iterations.
## Gradient range [-1.198123e-07,1.173967e-07]
## (score -24076.94 & scale 0.0003998499).
## Hessian positive definite, eigenvalue range [8.57444,4985.08].
## Model rank = 770 / 770
##
## Basis dimension (k) checking results. Low p-value (k-index<1) may
## indicate that k is too low, especially if edf is close to k'.
##
## k' edf k-index p-value
## s(time) 19.0 17.7 1 0.43
## s(time,curveId) 750.0 505.6 1 0.42
acf_resid(m1.randCurves)
par(op)
The results show that:
As alternative to the mgcv library, you can try the sparseFLMM
library, which incorporates curve-level random smooths by default and it
performs an efficient estimation based on functional PCA.
sparseFLMM is (in my opinion) generally less user-friendly
than mgcv.
This is a curve dataset with a 4-level factor F4. An AR1
residual with coefficient \(\rho =\)
0.9 is added to the 4 expected curves.
Let us first model it as:
y ~ F4 + s(t, by = F4), i.e. a
regular GAM with one factor and no AR1 residual.
##
## Family: gaussian
## Link function: identity
##
## Formula:
## y ~ F4 + s(t, by = F4, k = 10)
##
## Parametric coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.503876 0.001413 356.636 < 2e-16 ***
## F4A.2 0.482954 0.001998 241.709 < 2e-16 ***
## F4B.1 -0.011297 0.001998 -5.654 1.68e-08 ***
## F4B.2 0.455562 0.001998 228.000 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Approximate significance of smooth terms:
## edf Ref.df F p-value
## s(t):F4A.1 8.638 8.962 12669 <2e-16 ***
## s(t):F4A.2 8.816 8.990 14264 <2e-16 ***
## s(t):F4B.1 8.551 8.942 12218 <2e-16 ***
## s(t):F4B.2 8.889 8.996 6972 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## R-sq.(adj) = 0.992 Deviance explained = 99.2%
## fREML = -6733.5 Scale est. = 0.0020361 n = 4080
##
## Method: fREML Optimizer: perf newton
## full convergence after 8 iterations.
## Gradient range [-2.764712e-06,2.764141e-06]
## (score -6733.482 & scale 0.002036088).
## Hessian positive definite, eigenvalue range [3.777704,2036.029].
## Model rank = 40 / 40
##
## Basis dimension (k) checking results. Low p-value (k-index<1) may
## indicate that k is too low, especially if edf is close to k'.
##
## k' edf k-index p-value
## s(t):F4A.1 9.00 8.64 1.07 1
## s(t):F4A.2 9.00 8.82 1.07 1
## s(t):F4B.1 9.00 8.55 1.07 1
## s(t):F4B.2 9.00 8.89 1.07 1
Notice that:
Let us add the AR1 term to the model, taking the estimated value \(\hat{\rho} =\) 0.8851126:
##
## Family: gaussian
## Link function: identity
##
## Formula:
## y ~ F4 + s(t, by = F4, k = 10)
##
## Parametric coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.503812 0.004994 100.876 <2e-16 ***
## F4A.2 0.483003 0.007063 68.382 <2e-16 ***
## F4B.1 -0.011101 0.007063 -1.572 0.116
## F4B.2 0.455781 0.007063 64.527 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Approximate significance of smooth terms:
## edf Ref.df F p-value
## s(t):F4A.1 8.654 8.974 1630 <2e-16 ***
## s(t):F4A.2 8.819 8.993 1889 <2e-16 ***
## s(t):F4B.1 8.561 8.958 1586 <2e-16 ***
## s(t):F4B.2 8.895 8.998 1494 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## R-sq.(adj) = 0.992 Deviance explained = 99.2%
## fREML = -10010 Scale est. = 0.0018469 n = 4080
##
## Method: fREML Optimizer: perf newton
## full convergence after 6 iterations.
## Gradient range [-5.816972e-05,5.753168e-05]
## (score -10009.53 & scale 0.001846912).
## Hessian positive definite, eigenvalue range [3.784681,2036.029].
## Model rank = 40 / 40
##
## Basis dimension (k) checking results. Low p-value (k-index<1) may
## indicate that k is too low, especially if edf is close to k'.
##
## k' edf k-index p-value
## s(t):F4A.1 9.00 8.65 1.07 1
## s(t):F4A.2 9.00 8.82 1.07 1
## s(t):F4B.1 9.00 8.56 1.07 1
## s(t):F4B.2 9.00 8.89 1.07 1
Notice that:
Let us now change two things:
A and BLet us model this dataset as: y ~ F2 + s(t, by = F2)
##
## Family: gaussian
## Link function: identity
##
## Formula:
## y ~ F2 + s(t, by = F2, k = 10)
##
## Parametric coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.745345 0.006508 114.524 <2e-16 ***
## F2B -0.018165 0.009204 -1.974 0.0485 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Approximate significance of smooth terms:
## edf Ref.df F p-value
## s(t):F2A 5.541 6.686 752.7 <2e-16 ***
## s(t):F2B 6.539 7.675 435.9 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## R-sq.(adj) = 0.673 Deviance explained = 67.4%
## fREML = 822.95 Scale est. = 0.086408 n = 4080
##
## Method: fREML Optimizer: perf newton
## full convergence after 8 iterations.
## Gradient range [-5.533282e-07,4.192454e-07]
## (score 822.9545 & scale 0.08640774).
## Hessian positive definite, eigenvalue range [2.154157,2038.006].
## Model rank = 20 / 20
##
## Basis dimension (k) checking results. Low p-value (k-index<1) may
## indicate that k is too low, especially if edf is close to k'.
##
## k' edf k-index p-value
## s(t):F2A 9.00 5.54 0.1 <2e-16 ***
## s(t):F2B 9.00 6.54 0.1 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Notice that:
Let us blindly apply the AR1 recipe, i.e. take the estimated \(\hat{\rho} =\) 0.9911937 and re-fit the model:
##
## Family: gaussian
## Link function: identity
##
## Formula:
## y ~ F2 + s(t, by = F2, k = 10)
##
## Parametric coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.74514 0.03829 19.458 <2e-16 ***
## F2B -0.01803 0.05416 -0.333 0.739
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Approximate significance of smooth terms:
## edf Ref.df F p-value
## s(t):F2A 8.194 8.870 193.5 <2e-16 ***
## s(t):F2B 8.610 8.969 254.0 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## R-sq.(adj) = 0.672 Deviance explained = 67.4%
## fREML = -7751.7 Scale est. = 0.067817 n = 4080
##
## Method: fREML Optimizer: perf newton
## full convergence after 8 iterations.
## Gradient range [-2.848765e-09,2.66256e-09]
## (score -7751.663 & scale 0.06781707).
## Hessian positive definite, eigenvalue range [3.560744,2038.013].
## Model rank = 20 / 20
##
## Basis dimension (k) checking results. Low p-value (k-index<1) may
## indicate that k is too low, especially if edf is close to k'.
##
## k' edf k-index p-value
## s(t):F2A 9.00 8.19 0.1 <2e-16 ***
## s(t):F2B 9.00 8.61 0.1 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Notice that:
Let us instead add a curve-specific random smooth term (and remove the AR1 term):
##
## Family: gaussian
## Link function: identity
##
## Formula:
## y ~ F2 + s(t, by = F2, k = 10) + s(t, curveId, by = F2, bs = "fs",
## m = 1)
##
## Parametric coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.74534 0.03882 19.202 <2e-16 ***
## F2B -0.01816 0.05356 -0.339 0.735
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Approximate significance of smooth terms:
## edf Ref.df F p-value
## s(t):F2A 8.023 8.139 73.92 <2e-16 ***
## s(t):F2B 8.592 8.645 112.93 <2e-16 ***
## s(t,curveId):F2A 348.851 359.000 1201.31 <2e-16 ***
## s(t,curveId):F2B 347.848 359.000 1184.22 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## R-sq.(adj) = 0.998 Deviance explained = 99.9%
## fREML = -8195.9 Scale est. = 0.00040811 n = 4080
##
## Method: fREML Optimizer: perf newton
## full convergence after 16 iterations.
## Gradient range [-1.997467e-05,1.62132e-05]
## (score -8195.885 & scale 0.0004081082).
## Hessian positive definite, eigenvalue range [3.384527,2063.451].
## Model rank = 1460 / 1460
##
## Basis dimension (k) checking results. Low p-value (k-index<1) may
## indicate that k is too low, especially if edf is close to k'.
##
## k' edf k-index p-value
## s(t):F2A 9.00 8.02 0.99 0.32
## s(t):F2B 9.00 8.59 0.99 0.33
## s(t,curveId):F2A 720.00 348.85 0.99 0.34
## s(t,curveId):F2B 720.00 347.85 0.99 0.30
Notice that: